Billboard Difficulty Analysis

Author

John Ashley Burgoyne

This document presents the raw reliability analysis for Tom’s Billboard annotations. We use three item–response models: partial credit (PCM), generalised partial credit (GPCM), and extended partial credit (EPCM). In all three, βₙ is the difficulty of song n, and δᵢₖ is the difficulty threshold for criterion i at which Tom is equally likely to choose a score of k and a score of k − 1. We report all results on a stanine scale.
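For orientation, one common way to write the partial-credit model is sketched below. The source does not spell out its parameterisation, so the sign convention (scores rising with β − δ), the score range 0 to K_i, and the symbol X_{ni} for Tom's score on criterion i of song n are assumptions rather than the author's stated model.

```latex
% Assumed notation: X_{ni} is the score for song n on criterion i,
% with categories k = 0, 1, ..., K_i; an empty sum (k = 0) equals 0.
\[
\Pr(X_{ni} = k \mid \beta_n, \boldsymbol{\delta}_i)
  = \frac{\exp\!\left(\sum_{j=1}^{k} (\beta_n - \delta_{ij})\right)}
         {\sum_{m=0}^{K_i} \exp\!\left(\sum_{j=1}^{m} (\beta_n - \delta_{ij})\right)}
\]
% Adjacent-category odds: equal to 1 (a 50-50 choice) exactly when
% beta_n = delta_{ik}.
\[
\frac{\Pr(X_{ni} = k)}{\Pr(X_{ni} = k - 1)} = \exp(\beta_n - \delta_{ik})
\]
```

Under this form the 50–50 property holds whichever sign convention is used, because the adjacent-category odds equal 1 whenever the song difficulty coincides with the threshold. The generalised model is usually obtained by multiplying each β − δ term by a criterion-specific discrimination parameter.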

The partial-credit model achieves a reliability of 0.74 for estimating song difficulties, so two songs must differ by at least 3.1 stanines for the difference to be considered significant.
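To give the stanine figures a sense of scale, the sketch below assumes the conventional definition of stanines (mean 5, standard deviation 2) applied without rounding, which the fractional differences reported here suggest. The helpers `to_stanine` and `stanine_diff_to_sd` are hypothetical names, and the difficulty values are made up for illustration; they are not estimates from this analysis.

```r
# Hypothetical helper: map raw difficulty estimates to a continuous
# stanine scale (assumed convention: mean 5, SD 2, no rounding).
to_stanine <- function(x) {
  z <- (x - mean(x)) / sd(x)  # standardise to mean 0, SD 1
  5 + 2 * z
}

# Illustrative values only, not taken from the fitted models.
beta <- c(-1.2, -0.4, 0.0, 0.3, 1.3)
stanines <- to_stanine(beta)

# Because one stanine is half a standard deviation, a stanine
# difference divided by 2 gives the difference in SD units.
stanine_diff_to_sd <- function(d) d / 2
stanine_diff_to_sd(3.1)
```

On that assumption, a threshold of 3.1 stanines corresponds to a little over one and a half standard deviations of song difficulty.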

[PCM panels: Criteria, Songs]

The generalised partial-credit model achieves a reliability of 0.84 for estimating song difficulties, so two songs must differ by at least 2.4 stanines for the difference to be considered significant.

[GPCM panels: Criteria, Songs]

The extended partial-credit model achieves a reliability of 0.86 for estimating song difficulties, so two songs must differ by at least 2.2 stanines for the difference to be considered significant.

[EPCM panels: Criteria, Songs]

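The generalised and extended models are statistically indistinguishable under leave-one-out cross-validation: their ELPD difference (8.7) is small relative to its standard error (5.2). Each of them, however, performs clearly better than the plain partial-credit model.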

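# Compare the three fits on expected log predictive density (ELPD),
# estimated by approximate leave-one-out cross-validation.
loo::loo_compare(
  list(
    pcm = pcm_fit$loo(),
    gpcm = gpcm_fit$loo(),
    epcm = epcm_fit$loo()
  )
)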
     elpd_diff se_diff
epcm    0.0       0.0 
gpcm   -8.7       5.2 
pcm  -105.5      14.2
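One way to read the comparison table is to divide each ELPD difference by its standard error. The sketch below does this with the values copied from the output above. The cut-off of two standard errors is a common rule of thumb and an assumption here, not a criterion stated in the analysis, and the names `comparison`, `ratio`, and `clearly_worse` are illustrative.

```r
# Values copied by hand from the loo_compare() output above.
comparison <- data.frame(
  model     = c("epcm", "gpcm", "pcm"),
  elpd_diff = c(0.0, -8.7, -105.5),
  se_diff   = c(0.0, 5.2, 14.2)
)

# Ratio of each ELPD difference to its standard error. The reference
# model (first row) has se_diff = 0, so its ratio is set to 0.
comparison$ratio <- ifelse(
  comparison$se_diff > 0,
  abs(comparison$elpd_diff) / comparison$se_diff,
  0
)

# Assumed rule of thumb: a model is clearly worse than the reference
# when its ELPD deficit exceeds two standard errors.
comparison$clearly_worse <- comparison$ratio > 2

print(comparison)
```

With these numbers the generalised model falls short of the extended model by roughly 1.7 standard errors, while the plain partial-credit model falls short by roughly 7.4, which is consistent with the conclusion stated above.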